1 Notes

  • In minitax, reads were NOT filtered for MAPQ
  • In minitax, results were NOT normalized to Genome Size
  • Unclassified reads were NOT excluded
  • Eukaryotes were NOT excluded from the analysis
  • The Gold Standard (Theoretical composition) was: Zymo D3600
  • The taxonomic lineage of the Gold Standard was taken from NCBI

1.1 Zymo Gold Standard composition

In the theoretical composition, Limosilactobacillus fermentum was renamed to Lactobacillus fermentum.

2 Detection Statistics based on Taxa presence/absence on different levels

  • Precision= true positives /(true positives + false positives)
  • Recall= true positives /(true positives + false negatives)
  • F1=(2∗ precision ∗ recall)/(precision + recall)
  • F0.5=((1+0.5²)∗ precision ∗ recall)/((0.5²∗ precision)+ recall)
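
The four statistics above can be sketched in a few lines of Python. This is an illustrative example only, not the code used to produce this report: the function name `detection_stats` and the placeholder taxon names are assumptions, and a taxon counts as "present" simply if it appears in the set.

```python
def detection_stats(expected, observed):
    """Presence/absence detection statistics for one taxonomic level.

    expected: taxa in the theoretical composition (hypothetical input)
    observed: taxa reported by the classifier (hypothetical input)
    """
    expected, observed = set(expected), set(observed)
    tp = len(expected & observed)   # expected and detected
    fp = len(observed - expected)   # detected but not expected
    fn = len(expected - observed)   # expected but not detected

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    def f_beta(beta):
        # General F-beta score; beta=1 gives F1, beta=0.5 gives F0.5.
        denom = beta ** 2 * precision + recall
        return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f_beta(1.0),
        "f0.5": f_beta(0.5),
    }


# Placeholder taxa, not real results: 2 true positives, 1 false positive,
# 2 false negatives.
stats = detection_stats(
    expected={"taxon_a", "taxon_b", "taxon_c", "taxon_d"},
    observed={"taxon_a", "taxon_b", "taxon_x"},
)
print(stats)
```

F0.5 weights precision more heavily than recall, so it penalises false positives more strongly than F1 does.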

2.1 Species-level

2.2 Genus-level

3 Relative abundance of taxa at each taxonomic level

3.1 Phylum level

3.2 Order level

3.3 Genus level

3.4 Species level

4 Correlations

The correlations between the theoretical and observed compositions are shown for each taxonomic level.
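
As a rough illustration of how such a correlation could be computed, the sketch below aligns two abundance tables by taxon and calculates Pearson's r and r². It assumes Pearson correlation, which this report does not state explicitly, and treats taxa missing from one table as having abundance 0. The helper names `align` and `pearson_r` and all values are hypothetical.

```python
import math


def align(theoretical, observed):
    """Return two abundance lists over the union of taxa (missing = 0.0)."""
    taxa = sorted(set(theoretical) | set(observed))
    x = [theoretical.get(t, 0.0) for t in taxa]
    y = [observed.get(t, 0.0) for t in taxa]
    return x, y


def pearson_r(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variance
    return sxy / math.sqrt(sxx * syy)


# Made-up relative abundances, for illustration only.
theoretical = {"taxon_a": 0.4, "taxon_b": 0.3, "taxon_c": 0.2, "taxon_d": 0.1}
observed = {"taxon_a": 0.35, "taxon_b": 0.35, "taxon_c": 0.2, "taxon_x": 0.1}

x, y = align(theoretical, observed)
r = pearson_r(x, y)
print(f"r = {r:.3f}, r2 = {r ** 2:.3f}")
```

If every taxon has the same theoretical abundance, the theoretical vector has no variance and r is undefined, which is why the sketch returns NaN in that case.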

4.1 Phylum level

4.2 Order level

4.3 Genus level

4.4 Species level

4.5 Summarised r² values

5 Chi-square tests

Is the observed distribution significantly different from the theoretical?
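
A goodness-of-fit statistic for this question could be computed along the following lines. This is a minimal sketch under stated assumptions, not the test as run for this report: the function name `chi_square_statistic` is hypothetical, the comparison is restricted to taxa present in the theoretical composition (taxa with an expected proportion of 0 would make the statistic undefined), and the counts are invented.

```python
def chi_square_statistic(observed_counts, expected_props):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom.

    observed_counts: read counts per taxon (hypothetical input)
    expected_props:  theoretical proportions per taxon, all > 0
    """
    taxa = list(expected_props)
    # Only reads assigned to expected taxa enter the comparison.
    total = sum(observed_counts.get(t, 0) for t in taxa)
    # Renormalise in case the proportions do not sum exactly to 1.
    prop_sum = sum(expected_props.values())

    stat = 0.0
    for t in taxa:
        expected = total * expected_props[t] / prop_sum
        observed = observed_counts.get(t, 0)
        stat += (observed - expected) ** 2 / expected

    df = len(taxa) - 1
    return stat, df


# Invented counts and proportions, for illustration only.
stat, df = chi_square_statistic(
    observed_counts={"taxon_a": 50, "taxon_b": 30, "taxon_c": 20},
    expected_props={"taxon_a": 0.5, "taxon_b": 0.25, "taxon_c": 0.25},
)
print(f"chi-square = {stat:.2f}, df = {df}")
```

The p-value is then the upper tail of the chi-square distribution with `df` degrees of freedom, available for example through `scipy.stats.chi2.sf(stat, df)`. With sequencing-scale read counts, even small deviations tend to come out as significant, so the statistic is best read alongside the correlations.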

5.1 Species level

5.2 Genus level